Briefings in Bioinformatics
Oxford University Press (OUP)
Preprints posted in the last 30 days, ranked by how well they match Briefings in Bioinformatics's content profile, based on 326 papers previously published here. The average preprint has a 0.25% match score for this journal, so anything above that is already an above-average fit.
Kaira, V. S.; Kudari, Z. D.; P, S. S.; Bhat, R.; G, J.
Drug-target interaction prediction is important in the hit identification phase of drug discovery, enabling the identification of potential drug candidates for downstream optimization. Traditional computational methods are limited in their ability to represent 3D structural data for both molecules and target proteins, which is required to capture the intricate protein-ligand interactions that regulate binding affinity. Here, we propose a graph transformer-based model (GTStrDTI) that combines an intragraph attention mechanism with cross-modal attention to enrich the representation of both the drug molecule and the target protein. This approach models both intramolecular structural features and intermolecular interactions, thereby enhancing binding affinity prediction performance. A thorough evaluation on benchmark datasets such as KIBA, DAVIS, and BindingDB_Kd shows that our approach surpasses state-of-the-art methods under challenging target cold-start settings. Our analysis found that augmenting the model with a graph-based 3D structural representation of the protein target (C-alpha contact graphs from PDB with a threshold distance of 5 Å) and incorporating molecule adjacency information boosts predictive performance, thus contributing towards narrowing the gap between computational and experimental research.
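The C-alpha contact graph mentioned in this abstract can be illustrated with a minimal sketch. It assumes the C-alpha coordinates have already been extracted from a PDB file; the function name `contact_graph` and the toy coordinates are hypothetical and are not taken from the GTStrDTI code.

```python
import math

def contact_graph(ca_coords, threshold=5.0):
    """Link residues whose C-alpha atoms lie within `threshold` angstroms.

    Returns an adjacency list keyed by residue index. The 5.0 default
    mirrors the threshold quoted in the abstract; everything else here
    is an illustrative assumption.
    """
    n = len(ca_coords)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency

# Hypothetical toy coordinates in angstroms, not from any PDB entry.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 4.0, 0.0)]
graph = contact_graph(toy_coords)
```

The pairwise loop is quadratic in the number of residues, which is usually acceptable for single proteins; a spatial index would be the natural swap-in for very large structures.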
Haque, N.; Mazed, A.; Ankhi, J. N.; Uddin, M. J.
Accurate classification of SARS-CoV-2 genomic variants is essential for effective genomic surveillance, yet it is challenged by extreme class imbalance, limited representation of rare variants, and distribution shifts in real-world sequencing data. In this study, we employed a hybrid RF-SVM framework designed for robust detection of rare SARS-CoV-2 variants. It integrates a random forest and a polynomial-kernel support vector machine to enhance sensitivity to minority classes while maintaining overall predictive stability. We systematically compared classical machine learning models, deep learning approaches, and hybrid strategies under both standard and distribution-shifted evaluation settings. Our results show that classical models using TF-IDF-based k-mer features outperform deep learning methods on macro-averaged performance metrics. The random forest classifier using TF-IDF features achieved the best overall performance, with a macro-averaged F1-score of 0.8894 and an accuracy of 96.3%. The model also demonstrated strong generalization ability, as evidenced by stable cross-validation performance (CV accuracy = 0.9637). The hybrid RF-SVM model further improves rare variant detection under severe class imbalance. Calibration analysis indicates reliable probability estimates for common variants, although challenges persist for minority classes. Overall, this study highlights the limitations of deep learning in highly imbalanced genomic settings and demonstrates that carefully designed hybrid machine learning approaches provide an effective and interpretable solution for rare SARS-CoV-2 variant detection.
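To make the "TF-IDF-based k-mer features" concrete, here is a small sketch that treats each sequence as a document and each overlapping k-mer as a term. The abstract does not state the value of k or the exact weighting, so both are assumptions: k defaults to 3, and the smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1 is used. The function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def tfidf_kmer_features(sequences, k=3):
    """Return (vocabulary, feature rows) with one TF-IDF row per sequence.

    Term frequency is the k-mer count divided by the number of k-mers in
    the sequence; the smoothed IDF formula is an assumption, not a detail
    taken from the preprint.
    """
    counts = [kmer_counts(s, k) for s in sequences]
    vocab = sorted(set().union(*counts))
    n_docs = len(sequences)
    idf = {}
    for kmer in vocab:
        doc_freq = sum(1 for c in counts if kmer in c)
        idf[kmer] = math.log((1 + n_docs) / (1 + doc_freq)) + 1
    features = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against sequences shorter than k
        features.append([c[kmer] / total * idf[kmer] for kmer in vocab])
    return vocab, features

# Hypothetical toy sequences, far shorter than real genomes.
vocab, features = tfidf_kmer_features(["ACGT", "ACGA"], k=2)
```

Rows produced this way could then be passed to any classifier; k-mers shared by every sequence receive the minimum weight, so variant-specific k-mers dominate the representation.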
Shen, L.; Sun, X.; Zheng, S.; Hashmi, A.; Eriksson, J.; Mustonen, H.; Seppänen, H.; Shen, B.; Li, M.; Vähä-Koskela, M.; Tang, J.
Intratumoral heterogeneity is a major driver of variable drug responses in cancer. Single-cell RNA sequencing (scRNA-seq) enables the characterization of such heterogeneity, providing an opportunity to predict drug response at single-cell resolution. As a result, a growing number of computational models have been developed to infer drug response from scRNA-seq datasets. However, their performance, robustness, and generalizability across different biological contexts have not been systematically evaluated. To address this gap, we conducted a comprehensive benchmarking of representative single-cell drug response prediction models. Using 26 curated datasets comprising over 760,000 cells across 12 cancer types and 21 therapeutic agents, we constructed balanced and imbalanced scenarios to reflect more realistic distributions of drug response labels. To address the lack of ground-truth drug-response labels in conventional scRNA-seq datasets, we further incorporated lineage-tracing data with experimentally validated drug-response annotations, enabling model evaluation in a clinically relevant pre-treatment prediction setting. Our results show that, across the tested methods, prediction performance is markedly higher in cell lines than in tissue samples. Under imbalanced conditions, most methods exhibited sharp performance declines, whereas scDEAL demonstrated the highest robustness. Independent validation using an in-house pancreatic ductal adenocarcinoma dataset further confirms the robustness of scDEAL and its ability to capture biologically meaningful state transitions. A label-substitution experiment revealed that this robust performance is partially driven by the model's specific training label construction. However, the benchmarking with lineage-tracing data reveals a fundamental limitation: most models capture drug-induced transcriptional changes but struggle to predict a cell's intrinsic resistance state prior to treatment.
In summary, our study not only defines the performance boundaries of current approaches but also highlights their limitations in addressing intratumoral heterogeneity, extreme class imbalance, and the prediction of intrinsic cellular resistance, emphasizing the need for the development of next-generation single-cell drug response models with stronger clinical relevance.
Han, S.; Sztanka-Toth, T.; Senel, E.; Elnaggar, A.; Patel, J.; Mansi, T.; Smirnov, D.; Greshock, J.; Javidi, A.
Single-cell foundation models enable reusable representations and streamlined analysis workflows, yet rigorous evaluation of their performance and robustness in real-world pharmaceutical settings remains underexplored. Here, we benchmarked leading single-cell foundation models (scGPT; scGPT_CP, a continually pretrained checkpoint of scGPT; scFoundation; scMulan; CellFM) against established baseline methods (scVI; Harmony) for data integration using over 1.5 million cells from clinical and preclinical samples. Performance was assessed using well-established and complementary metrics for technical correction and biological structure preservation. We further introduced robustness-oriented rankings to summarize metric trade-offs and quantify performance consistency across datasets and evaluation settings. Our findings show that fine-tuning improved technical correction performance; among the foundation models, fine-tuned scGPT_CP performed best. However, the baseline scVI was the top overall performer, ranking first by our multi-metric Leximax ranking and achieving the highest Pareto Front-1 hit. Collectively, our study provides practical insights for adapting foundation models to real-world drug design and development.
Wang, T.; Liao, S.; Qi, Y.; Zhang, Z.
Liquid-liquid phase separation (LLPS) underlies the formation of biomolecular liquid condensates (also referred to as membraneless organelles, MLOs), which are essential for spatially organizing various biochemical processes within cells. Proteins that play a key role in driving condensate formation are termed phase-separating proteins (PSPs). Given that experimental identification of PSPs remains labor-intensive and time-consuming, multiple computational tools have been developed based on empirical features or deep learning. In this study, we propose SSPSPredictor, a novel multimodal predictive model for PSPs with folded or intrinsically disordered structures, leveraging the fusion of sequence information from the protein language model ESM-2 and structural insights from the graph neural network GVP. Compared with existing tools, SSPSPredictor achieves balanced performance in identifying endogenous PSPs, predicting relative LLPS propensities, and recognizing key regions that drive LLPS. Moreover, SSPSPredictor exhibits good interpretability in identifying driving regions along protein sequences, although no relevant supervision was provided during training. Further predictive analysis of the human proteome using SSPSPredictor reveals that the proportion of intrinsically disordered proteins (IDPs) undergoing LLPS is significantly higher than that of folded proteins. In addition, pathogenic variants, especially those located in disordered regions, exhibit higher LLPS propensity than other mutations, uncovering a link between LLPS and diseases at the amino acid level.
Gronning, A. G. B.; Scheele, C.
Peptides are gaining increasing attention as therapeutic agents. Already, peptide-based therapeutics play a key role in the treatment of diverse diseases, including diabetes, obesity, and other complex disorders, and their clinical relevance is expected to expand further in the coming years. Technological and computational advances have substantially enriched peptidomics, massively increasing the scale and depth of peptide identification. As a result, increasingly large and information-rich datasets are now available for downstream analysis and experimental validation. However, the rapid expansion of peptidomics datasets also leads to a corresponding increase in search space, complicating the efficient identification of peptides relevant to specific biological or clinical questions. To address this challenge, we present PepHammer, a lightweight web-based tool for bioactive peptide matching and identification. PepHammer allows users to input up to 10000 peptides (2-150 amino acids in length) and compare them against extensive databases of peptides with predicted or experimentally validated bioactivities and tissue associations using Hamming distance, Grantham distance, as well as partial or exact matching strategies. Via an example study of human milk peptidomics, we demonstrate that PepHammer rapidly provides an overview of the bioactivity and tissue-relational landscape, serving as a starting point for downstream analyses. PepHammer thus enables efficient exploration of large-scale peptidomics datasets and facilitates the identification of biologically relevant peptides.
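Hamming-distance matching, one of the strategies named in this abstract, can be sketched as follows. This is not PepHammer's implementation; the function names, the `max_distance` parameter, and the toy peptide strings are all hypothetical. Hamming distance is only defined for equal-length strings, so references of a different length are skipped.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def match_peptides(queries, database, max_distance=1):
    """Map each query to the (reference, distance) pairs within max_distance."""
    hits = {}
    for query in queries:
        matches = []
        for ref in database:
            if len(ref) != len(query):
                continue  # different lengths cannot be compared this way
            distance = hamming_distance(query, ref)
            if distance <= max_distance:
                matches.append((ref, distance))
        hits[query] = matches
    return hits

# Hypothetical toy peptides, not entries from any bioactivity database.
hits = match_peptides(["ACDEF"], ["ACDEF", "ACDEG", "WWWWW", "ACD"])
```

A distance of 0 corresponds to an exact match, so exact matching falls out as the special case `max_distance=0`.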
Chauquet, S.; Jiang, J.-C.; Barker, L. F.; Hunter, Z. L.; Singh, G.; Wray, N. R.; McRae, A. F.; Shah, S.
Drug targets supported by human genetic evidence have significantly higher approval rates, making genome-wide association studies a valuable resource for drug candidate prioritisation. Transcriptome-wide association study signature-matching is an emerging in silico approach that integrates GWAS data with expression quantitative trait loci to generate a disease gene expression signature, which is then compared against drug perturbation databases such as the Connectivity Map. Despite recent adoption, there is no consensus on optimal methodology. Here, we systematically benchmark key parameters, including TWAS method, eQTL tissue model, similarity metric, gene set size, and CMap cell line, using LDL cholesterol, familial combined hyperlipidemia, and asthma as proof-of-concept traits. We demonstrate that while TWAS signature-matching can successfully prioritise known first-line treatments, performance is highly sensitive to parameter choice; for instance, the selection of the cell line used for drug signatures alone can dramatically alter drug prioritisation. Based on these findings, we propose a best-practice framework for robust, genetically-informed drug prioritisation using TWAS signature-matching.
Liu, X.; Kantorow, J.; Chattopadhyay, A. K.; Chakraborty, S.
Experimental structural methods can identify antibody-antigen interfaces with high precision, but they remain time-consuming and resource-intensive, limiting their application across the rapidly expanding space of antibody and antigen sequences. Computational models capable of predicting these interfaces could therefore accelerate antibody discovery and provide insight into the principles governing immune recognition. However, this problem remains challenging due to limited structural datasets, severe class imbalance, and the complex, non-local nature of biomolecular interactions. Here we present VASCIF (Variable-domain Antibody-antigen Structural Complex Interface Finder), a structure-aware framework built on a Masked Graph Attention (MGA) architecture that represents protein complexes as residue graphs and captures long-range structural dependencies through attention-based message passing. The framework is straightforward to implement and enables efficient inference, allowing substantially faster predictions than other existing structure-based approaches. Evaluated on curated structural complexes across multiple benchmark datasets using rigorous cross-validation, VASCIF achieves state-of-the-art performance for residue-level interface prediction. Interpretability analyses reveal that the model recovers biophysically meaningful interaction patterns consistent with known principles of antibody recognition, and redefining interfaces using larger residue distance thresholds (~10 Å) significantly improves predictive performance. Together, VASCIF provides a practical predictive framework and new insights into antibody-antigen molecular recognition.
Garcia, J. J.; Yu, K. M.; Freudenreich, C. H.; Cowen, L.
In baker's yeast, there exists a comprehensive collection of pairwise epistasis experiments that, for nearly every pair of non-essential genes, measures the growth of the double-knockout strain as compared to its component single knockouts. These data can be represented as a weighted signed graph termed the genetic interaction network. We introduce a new ILP-based method named GIDEON to search for a diverse collection of Between-Pathway Models (BPMs) in this network, where BPMs are a graph motif signature that indicates potential compensatory pathways in the genetic interaction network. With both an improved distribution-informed edge weighting scheme and an improved ILP method, GIDEON produces BPM collections that are substantially larger and have better functional enrichment than those of previous methods. We find some interesting new BPM gene sets, including one with potential insights into antifungal drug targets through ties between ergosterol and aromatic amino acid biosynthesis.
Kierzek, E.; Shabangu, T. S.; Hiltke, O. M.; Miaro, M.; Arteaga, S.; Znosko, B. M.; Jolley, E. A.; Bevilacqua, P. C.; SantaLucia, J.; SantaLucia, H. A.; Lin, H.; Metkar, M.; Aviran, S.; Soszynska-Jozwiak, M.; Kierzek, R.; Mathews, D. H.
Nearest neighbor analysis is commonly used to estimate RNA folding stabilities. In this contribution, we report a set of RNA folding nearest neighbor parameters for estimating free energy change for RNA sequences including 1-methyl-pseudouridine. Development of mRNA vaccines has identified 1-methyl-pseudouridine as a key nucleobase modification for suppressing innate immune responses. However, the contributions of these modifications to RNA folding stability were unclear. Our new parameters provide helical terms for 1-methyl-pseudouridine-adenine and 1-methyl-pseudouridine-guanine base pairs. The parameters also estimate loop stabilities for loops with 1-methyl-pseudouridine or a combination of 1-methyl-pseudouridine and uridine. These parameters are derived using 208 optical melting experiments and tested against an additional 16 optical melting experiments. On average, we find that substitution of uridine with 1-methyl-pseudouridine stabilizes RNA folding, with the extent of stabilization depending on adjacent sequence. The estimation of tRNA folding ensembles for tRNA sequences with 1-methyl-pseudouridine was significantly improved using the new nearest neighbor parameters. The new nearest neighbor parameters are provided as part of the RNAstructure software package. With these parameters, the secondary structures of natural sequences with 1-methyl-pseudouridine and mRNA therapeutics fully substituted with 1-methyl-pseudouridine can be modeled.
Simoes, C. D. M. S.; Maidana, R. L. B. R.; De Assis, S. C.; Guerra, J. V. d. S.; Ribeiro-Filho, H. V.
The T cell receptor (TCR) recognition of multiple peptides presented by the major histocompatibility complex (MHC) is a key natural phenomenon, enabling the T cell repertoire to respond to a broad array of antigens. Despite its importance to the immune response, T cell cross-reactivity poses a major challenge for the development of novel T cell-based therapies. In this study, we present MHCXGraph, a graph-based computational approach for identifying conserved and immunologically relevant regions across multiple structures of peptides bound to MHC molecules (pMHC). Our approach provides three operational modes with user-defined parameters, allowing flexible configuration according to specific scientific needs while delivering fully interpretable results through user-friendly interfaces. We evaluated MHCXGraph across three case studies, including peptides bound to classical MHC Class I, MHC Class II, and unbound HLA alleles, demonstrating its ability to capture conserved structural determinants beyond sequence similarity. By integrating structural information with efficient graph-based analysis, MHCXGraph addresses key limitations of sequence-based methods while maintaining computational scalability. Collectively, these results indicate that MHCXGraph can be readily integrated into computational pipelines for T cell cross-reactivity discovery, especially in the context of de novo pMHC engager design and T cell-based vaccine development.
Bokman, E.; Barlam, N.; Babay, O.; Balshayi, Y.; Eliezer, Y.; Zaslaver, A.
High-throughput phenotyping of biological samples is essential for large-scale studies but is frequently bottlenecked by the need for accurate instance segmentation in crowded images. While deep learning offers powerful solutions, the high cost of manual annotation and the requirement for coding expertise often limit adoption in routine laboratory workflows. Here we present SegBio, a lightweight, open-source pipeline that enables end-to-end instance segmentation for non-expert users. The protocol features an interactive annotation GUI that extrapolates full masks from minimal centerline markings, significantly reducing manual labeling effort. It further integrates a configurable U-Net training module and a standalone inference application with a human-in-the-loop editing workflow for rapid and intuitive error correction. We employ the pipeline to annotate and train the model on a novel dataset of crowded C. elegans images. Validated on independent datasets, SegBio achieves high segmentation performance (panoptic quality ~0.85) and accurately quantifies per-animal morphology and fluorescence. By eliminating external dependencies and streamlining the correction process, SegBio provides a scalable solution for routine phenotyping that is easily generalized to other crowded biological samples, such as cellular organelles, cells, and organisms.
Chowdhury, T. D.; Shafoyat, M. U.; Hemel, N. H.; Nizam, D.; Sajib, J. H.; Toha, T. I.; Nyeem, T. A.; Farzana, M.; Haque, S. R.; Hasan, M.; Siddiquee, K. N. e. A.; Mannoor, K.
Alzheimer's disease remains a major therapeutic challenge, and no β-secretase (BACE1) inhibitor has achieved clinical approval. A key limitation of prior discovery efforts is reliance on single-parameter optimization, often resulting in candidates with limited translational potential. In this study, we developed a biology-informed computational framework integrating meta-ensemble QSAR modeling, molecular docking, protein language model (ESM-1b)-guided residue interaction weighting, and ADMET profiling within a normalized multi-parameter ranking scheme. Model performance was validated using cross-validation, external validation, and Y-randomization (n = 100; p = 0.009), while applicability domain analysis based on Tanimoto similarity highlighted reduced reliability for extrapolative predictions. Sensitivity analysis showed high ranking stability under moderate perturbations (Spearman ρ = 0.998 for ±10%; 0.963 for ±25%), with reduced agreement under randomized weighting (ρ = 0.821), indicating that prioritization is robust but influenced by weight selection. Screening of 16,196 compounds identified 153 predicted actives (accuracy = 0.852; ROC-AUC = 0.920), which were refined to 111 candidates and seven prioritized leads. Molecular dynamics simulations (200 ns) indicated stable binding and persistent catalytic interactions, with Mol-2 showing favorable dynamic stability and ADMET characteristics. Overall, this study presents an interpretable and quantitatively evaluated framework for multi-parameter compound prioritization, supporting more reliable virtual screening in early-stage CNS drug discovery.
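An applicability domain check based on Tanimoto similarity, as mentioned in this abstract, can be sketched roughly as below. Fingerprints are represented as sets of "on" bit indices. The nearest-neighbour rule, the 0.5 cutoff, and the function names are illustrative assumptions; the abstract does not describe how the authors defined their domain.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_applicability_domain(query_fp, training_fps, min_similarity=0.5):
    """True if the query's nearest training compound is similar enough.

    The min_similarity cutoff is a hypothetical value chosen for the example.
    """
    best = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    return best >= min_similarity

# Hypothetical fingerprints given as sets of on-bit indices.
training = [{2, 3, 4}, {9}]
inside = in_applicability_domain({1, 2, 3}, training)
outside = in_applicability_domain({7, 8}, training)
```

Predictions for compounds flagged as outside the domain are extrapolations, which is the situation the abstract associates with reduced reliability.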
Overmann, M.; Grabert, G.; Kacprowski, T.
Background: Gene expression profiling is widely used to investigate disease mechanisms, but classical approaches such as differential expression or pairwise correlation analyses provide limited interpretability. Network-based differential co-expression methods that model conditional dependencies through partial correlations offer richer insights, yet their application in high-dimensional settings requires estimation of precision matrices. Numerous precision matrix estimation methods (PMEMs) have been proposed, but their relative performance under various conditions remains unclear. Results: Simulated gene expression datasets with known ground truth correlation structures were used to benchmark a broad set of PMEMs. Performance was strongly affected by data characteristics, including covariance structure, matrix density, covariance values, sample size-to-dimension ratio, and sampling distribution. Among the evaluated methods, GLassoElnetFast consistently showed the highest accuracy in recovering differential edges, although high signal-to-noise ratios and sufficient sample sizes remain essential for reliable inference. Conclusions: Evaluation across diverse simulation conditions demonstrated that no single metric or condition was sufficient to assess PMEM performance. Therefore, previous less extensive evaluations risked misleading conclusions. Our simulation and benchmarking framework supports future method development and ensures reproducible evaluation of newly developed approaches.
Wu, R.; Mao, L.; Diao, Y.; Li, H.
Drafting Markush claims for chemical patents remains difficult because manual claim writing is slow, error prone, and often fails to capture related chemical space in a systematic manner. We developed SpaceExpander, a computational method that converts disclosed compounds into generalized Markush claims by extracting core scaffolds, defining variable positions, decomposing complex substituents, and expanding substituent space through fragment matching. We evaluated the method on 24 publicly available chemical patents and compared its performance with IntelliPatent. SpaceExpander achieved a mean atom level scaffold accuracy of 0.92 and exactly recovered the reference scaffold in 19 of 24 patents. By contrast, IntelliPatent could process only 2 patents from the same set, indicating more limited applicability to structurally diverse cases. We further examined practical claim coverage in a case study based on the Osimertinib patent. Using representative disclosed compounds as input, SpaceExpander drafted a Markush claim that covered 5 of 7 additional approved third-generation EGFR inhibitors beyond Osimertinib. These results show that SpaceExpander is a validated method for automated Markush claim drafting and chemical space expansion.
Fletcher, W. L.; Sinha, S.
The practices of identifying biomarkers and developing prognostic models using genomic data have become increasingly prevalent. Such data often feature characteristics that make these practices difficult, namely high dimensionality, correlations between predictors, and sparsity. Many modern methods have been developed to address these problematic characteristics while performing feature selection and prognostic modeling, but a large-scale comparison of their performance in these tasks on diverse right-censored time-to-event data (also known as survival time data) is much needed. We have compiled many existing methods, including some machine learning methods, several of which have performed well in previous benchmarks, and compared them primarily on variable selection capability and secondarily on survival time prediction, using many synthetic datasets with varying levels of sparsity, correlation between predictors, and signal strength of informative predictors. For illustration, we have also performed multiple analyses on a publicly available and widely used cancer cohort from The Cancer Genome Atlas using these methods. We evaluated the methods through extensive simulation studies in terms of the false discovery rate, F1-score, concordance index, Brier score, root mean square error, and computation time. Of the methods compared, CoxBoost and the adaptive LASSO performed well in all metrics, and the LASSO and elastic net excelled when evaluating concordance index and F1-score. The Benjamini-Hochberg and q-value procedures showed volatile performance in controlling the false discovery rate. Some methods' performance was greatly affected by differences in the data characteristics. With our extensive numerical study, we have identified the best-performing methods for a plethora of data characteristics using informative metrics. This will help cancer researchers in choosing the best approach for their needs when working with genomic data.
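For readers unfamiliar with the Benjamini-Hochberg procedure named in this abstract, the following is a minimal sketch of the standard step-up rule: sort the m p-values, find the largest rank r whose p-value is at most (r / m) × alpha, and reject every hypothesis ranked at or below r. The function name and example p-values are hypothetical; this is not the benchmarking code from the preprint.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for illustration only.
strict = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
step_up = benjamini_hochberg([0.02, 0.02, 0.03, 0.04])
```

The second call shows the step-up behaviour: the smallest p-value (0.02) fails its own threshold of 0.0125, yet it is still rejected because a higher-ranked p-value passes.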
Liu, T.; Jiang, S.; Zhang, F.; Sun, K.; Head-Gordon, T.; Zhao, H.
Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages and limitations relative to traditional drug discovery platforms. To tackle this emergent problem, we have developed DrugPlayGround, a framework to evaluate and benchmark LLM performance in generating meaningful text-based descriptions of physicochemical drug characteristics, drug synergism, drug-protein interactions, and the physiological response to perturbations introduced by drug molecules. Moreover, DrugPlayGround is designed to work with domain experts to provide detailed explanations justifying the predictions of LLMs, thereby testing LLMs for chemical and biological reasoning capabilities and promoting their greater use at the frontier of drug discovery at all of its stages.
Bhati, U.; Gupta, S.; kesarwani, V.; Shankar, R.
Protein-protein interactions (PPIs) are the molecular Lego that define the physical states of cells. Accurately identifying PPIs remains challenging due to the interplay of several factors, ranging from electrostatics to molecular geometry, topology, and physics. Existing computational approaches capture only fragments of this orchestra, limiting their generalizability across protein families and interaction types. Here, we present ProMaya, a hierarchical multi-scale graph-transformer framework that integrates 3D atomic geometry, electronic distribution, residue-level structure and disorder, surface mass-density signatures, and large protein language-model embeddings of interacting proteins. Benchmarked comprehensively across nine species and 47 GB of experimentally validated data, ProMaya consistently achieved >95% average accuracy, outperforming state-of-the-art tools by >12%. Explainability analysis indicates that the atomic and protein language information, introduced here for the first time, dramatically boosted its performance for PPI discovery in any species, with the potential to bypass costly experiments. The ProMaya system is freely accessible at https://scbb.ihbt.res.in/ProMaya/
Duarte, S. A.; Mehdiabadi, M.; Bugnon, L. A.; Aspromonte, M. C.; Piovesan, D.; Milone, D. H.; Tosatto, S.; Stegmayer, G.
Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to technical difficulties and the high cost of experimental determination of disorder in proteins, combined with the exponential increase of unannotated protein sequences, the development of computational methods for disorder prediction has become an active area of research in the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) to predict disorder from sequence. The emb2dis tool is a pre-trained model that receives a protein sequence as input, calculates its pLM embedding, and passes it to a deep learning model. In contrast to existing approaches, emb2dis integrates informative sequence representations with a novel architecture that combines residual networks (ResNets) and dilated convolutions. This design effectively enlarges the receptive field of the convolution operation, enabling the model to better capture an extended context of each amino acid. At the output, emb2dis assigns a disorder propensity score to each residue in the sequence. The model was evaluated on datasets from the latest CAID3 blind benchmark for disorder prediction, where it achieved first place in the Disorder-PDB category, exhibiting strong performance with high AUC and Fmax scores. Additionally, it ranked among the top ten methods on the Disorder-NOX dataset. We provide a freely available web demo for emb2dis and a source code repository for local installation. Web link for the tool: https://sinc.unl.edu.ar/web-demo/emb2dis/ The importance of the emb2dis tool is that it provides a new deep learning approach and significant improvements in the prediction of protein disorder, with a simple web interface and graphical output detailing per-residue disorder.
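The claim that dilated convolutions enlarge the receptive field can be checked with a short calculation. For a stack of stride-1 one-dimensional convolutions with kernel size k and dilation d per layer, each layer adds (k - 1) × d positions to the receptive field. The kernel size and dilation schedule below are hypothetical and are not the emb2dis architecture.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in residues) of stacked stride-1 1D convolutions."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field

# Four layers with kernel size 3: ordinary versus exponentially dilated.
plain = receptive_field(3, [1, 1, 1, 1])    # 9 residues
dilated = receptive_field(3, [1, 2, 4, 8])  # 31 residues
```

With the same number of layers and parameters, the dilated stack in this example sees more than three times as much sequence context per output position.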
Jones, D.; Wu, Y.
Intrinsically disordered proteins (IDPs) mediate many cellular functions through interactions with structured protein partners, but predicting the corresponding binding sites on the structured partner remains challenging. Here, we present IDBSpred, a sequence-based method for residue-level prediction of IDP-binding sites on structured proteins. Training and test data were collected from the DIBS database, which contains more than 700 non-redundant IDP-protein complexes. Residue-level embeddings of structured partner sequences were generated using the ESM-2 protein language model and used as input to a multilayer perceptron classifier for binary prediction of binding versus non-binding residues. Analysis of amino acid composition showed that IDP-binding sites are enriched in aromatic residues, especially Trp, Tyr, and Phe, as well as several charged and polar residues, whereas Ala and several small or conformationally restrictive residues are depleted. The classifier achieved an ROC AUC of 0.87 and an average precision of 0.61. Structural case studies further showed that the predicted sites largely recapitulate the major experimentally defined binding interfaces. These results demonstrate that protein language model embeddings plus machine learning algorithms can effectively capture sequence features associated with IDP recognition on structured proteins. IDBSpred provides a practical framework for studying IDP-mediated interfaces and identifying potential therapeutic hotspots.